Support Vector Machines (SVM)
Overview:
SVM was used to assign potential functions for miRNAs on the basis of their expression patterns as described elsewhere (Brown et al. 2000). Briefly, SVM was used to generate independent probabilities for each miRNA belonging to one of 992 Gene Ontology (Ashburner et al. 2000) biological categories based on the expression patterns of the genes that are known to be in those categories. Missing values in the miRNA data (8%) were filled in with tissue medians, and data from each tissue was normalized to the variance of mRNAs in the same tissue. SVM was run using GIST v. 3.0 (Noble et al. 2004) with parameter settings “-radial -zeromeanrow -diagfactor 0.5”. Precision values were determined by three-fold cross-validation (Brown et al. 2000).
SVM is a machine learning method that considers each GO functional category separately and has previously been shown to work well for predicting gene functions in yeast on the basis of microarray expression data. The SVM outputs can be processed to obtain an estimate of the probability that the prediction for each gene in each category is correct (i.e. "precision"), on the basis of how well previously annotated genes in the given category can be distinguished from previously annotated genes that are not in the category.
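The preprocessing mentioned in the overview (median fill-in of missing miRNA values, then scaling to the mRNA variance in the same tissue) could be sketched as follows. This is only one possible reading of that description, not the code used for the analysis. The function name impute_and_scale, the gene-by-tissue list layout, the use of None for missing values, and the choice to take the median over the observed miRNA values within each tissue are all assumptions made for illustration.

```python
from statistics import median, stdev

def impute_and_scale(mirna, mrna):
    """Hypothetical helper. mirna and mrna are lists of rows (genes),
    each row holding one value per tissue; None marks a missing miRNA value."""
    n_tissues = len(mirna[0])
    out = [list(row) for row in mirna]
    for t in range(n_tissues):
        # Assumption: the "tissue median" is the median of the observed
        # miRNA values in this tissue.
        observed = [row[t] for row in mirna if row[t] is not None]
        fill = median(observed)
        col = [fill if row[t] is None else row[t] for row in mirna]
        # Assumption: rescale so the miRNA standard deviation in this tissue
        # equals that of the mRNAs (equivalent to matching the variances).
        # Requires at least two genes and non-constant miRNA values per tissue.
        scale = stdev([row[t] for row in mrna]) / stdev(col)
        for g, value in enumerate(col):
            out[g][t] = value * scale
    return out
```

For example, with three miRNAs measured in two tissues and one missing value, the missing entry is first replaced by the median of the other two values in that tissue, and the whole column is then rescaled to the spread of the mRNA values.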
Calculation of Precision:
For each GO category, we constructed a table that maps the discriminant values output by the SVM classifier to precision values. This table was constructed using the discriminant values assigned to each labelled XM gene from (Zhang et al. 2004) during cross-validation. Specifically, we divided the range of discriminant values into one thousand equally sized bins. Each bin was assigned a threshold equal to its lower limit. The precision associated with each bin was the ratio of the number of labelled genes in the category whose discriminant value was above the bin threshold to the total number of labelled genes with discriminant values above that threshold.
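The table construction described above can be illustrated with a minimal sketch for a single GO category. The function names precision_table and lookup_precision are hypothetical, and two details are assumptions: a gene whose discriminant value equals a threshold is counted as above it, and a new value is mapped to the bin with the highest threshold not exceeding it.

```python
from bisect import bisect_right

def precision_table(discriminants, in_category, n_bins=1000):
    """discriminants: cross-validated discriminant values of the labelled genes.
    in_category: matching booleans, True if the gene is annotated in the category."""
    lo, hi = min(discriminants), max(discriminants)
    width = (hi - lo) / n_bins
    # Each bin's threshold is the lower limit of its region.
    thresholds = [lo + i * width for i in range(n_bins)]
    precisions = []
    for t in thresholds:
        # Labels of all labelled genes at or above the threshold (assumption: ">=").
        labels_above = [y for d, y in zip(discriminants, in_category) if d >= t]
        # Fraction of those genes that belong to the category.
        precisions.append(sum(labels_above) / len(labels_above))
    return thresholds, precisions

def lookup_precision(value, thresholds, precisions):
    """Return the precision of the bin whose threshold is the largest one <= value."""
    i = bisect_right(thresholds, value) - 1
    return precisions[max(i, 0)]  # values below the range fall into the first bin
```

With four labelled genes at discriminant values 0, 1, 2 and 3, of which only the last two are in the category, and n_bins=4, the thresholds are 0, 0.75, 1.5 and 2.25 and the corresponding precisions are 1/2, 2/3, 1 and 1.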